Twitter API Design Decisions

Learn the different design considerations in Twitter's API and the reasons behind them.

This lesson covers the design considerations that will help us design an effective API for a service like Twitter. Before getting into the details of those design decisions, we will need to understand the end-to-end architecture of Twitter. Let's discuss the workflow of how Twitter APIs work.

Design overview

The following illustrations demonstrate a high-level view of how users are able to interact with the back-end services. Some of Twitter's functional requirements (that we defined in the previous lesson) have already been covered in this course. Now, we are interested in the creation process of Tweets and the generation of timelines for different users. The Tweet service allows the creation and storage of new Tweets, whereas the timeline service generates the timeline by interacting with multiple components, such as timeline generation, trends, people discovery, and ads. These services return the stream of Tweets, trends, suggestions for accounts to follow, and ads for different products to the timeline service. Moreover, the timeline service can return three different types of timelines (depending on the user's request), which are as follows:

  • Home timeline: This consists of the most recent Tweets from the accounts the user follows, along with recommendations (ads, trends, and people to follow).

  • User timeline: This consists of the Tweets, Retweets, and recommendations of a specific user.

  • Mentions timeline: This returns the top N Tweets in which the logged-in user (@username) is tagged.
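
As a rough illustration of how the three timeline types differ, the sketch below selects Tweets from a small in-memory list. The `Tweet` class, the `follows` mapping, and the function names are hypothetical and are not part of Twitter's API. Recommendations (ads, trends, and people to follow) are left out, and Tweets are ordered by recency only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tweet:
    tweet_id: int  # assumption for this sketch: a larger ID means a newer Tweet
    author: str
    text: str


def _most_recent(tweets, n):
    """Return the n newest Tweets, newest first."""
    return sorted(tweets, key=lambda t: t.tweet_id, reverse=True)[:n]


def home_timeline(user, follows, tweets, n=10):
    """Tweets posted by the accounts that `user` follows."""
    followed = follows.get(user, set())
    return _most_recent([t for t in tweets if t.author in followed], n)


def user_timeline(user, tweets, n=10):
    """Tweets posted by `user`."""
    return _most_recent([t for t in tweets if t.author == user], n)


def mentions_timeline(user, tweets, n=10):
    """Top N Tweets that tag @user (naive whitespace tokenization)."""
    tag = f"@{user}"
    return _most_recent([t for t in tweets if tag in t.text.split()], n)
```

The three functions differ only in the filter they apply, which is why a single timeline service can serve all of them.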

Let's see the details of how Tweets are sent to the system and how it generates a timeline.

The end-to-end architecture of the timeline and Tweet services

Let's discuss the role of each component and service shown in the above diagram.

Components and Services Details

Component or Service

Details

Tweet service

  • Stores Tweets (up to 280 characters each) in a database
  • Coordinates with the timeline service to keep timelines up to date


Timeline service

  • Looks up the accounts that active users follow in the database, fetches their Tweets, and ranks them
  • Combines Tweets, trends, accounts, and ads into a single response to generate a complete timeline
  • Performs pagination to reduce the burden on the client and backend

Feed cache

  • Acts as a distributed cache (such as Redis) that stores the timelines of active users

Trends

  • Generates trends on the basis of user location


Pub-sub

  • Notifies end users of any activity happening in their circle, for example, a new Tweet from a celebrity they follow
  • Decouples the Tweet service from the timeline service, enabling asynchronous communication between them

People discovery

  • Recommends people based on the user's interests

Ads

  • Recommends text, images, or video-based promotional ads

API gateway

  • Routes traffic to the appropriate service and balances load across multiple servers of the same service
  • Implements other functionality, such as rate limiting, basic authentication, and TLS connection termination
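
The table above mentions that the timeline service performs pagination. A minimal sketch of what that could look like is shown below. The function name, the response keys, and the use of a plain list index as the cursor are all assumptions made for illustration; the lesson does not specify how the cursor is encoded.

```python
def paginate(timeline, cursor=0, page_size=20):
    """Return one page of an already-ordered timeline and the next cursor.

    `cursor` is the index of the first item to return. `next_cursor` is
    None when there are no more items after this page.
    """
    end = cursor + page_size
    page = timeline[cursor:end]
    next_cursor = end if end < len(timeline) else None
    return {"items": page, "next_cursor": next_cursor}
```

The client keeps requesting pages with the returned `next_cursor` until it receives `None`, so neither side has to handle the whole timeline in a single response.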

Workflow

Let's see how services integrate to enable users to perform different operations.

Tweet service: When a user posts a Tweet, it goes to the Tweet service via the API gateway. The Tweet service stores the Tweet's content in the database. However, in the case of a media file, the client first uploads it to the server and gets its media ID in response. After that, the client sends a request via the API gateway to the Tweet service to create a Tweet, including the recently obtained media ID. Then, the Tweet service pushes a copy of the Tweet to the pub-sub service. The pub-sub service notifies the timeline service that a new Tweet has arrived.
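
To make the posting flow above concrete, here is a minimal in-memory sketch. The class names (`MediaStore`, `PubSub`, `TweetService`), the `new_tweet` topic, and the ID formats are assumptions made for illustration. In particular, this toy pub-sub calls its handlers directly, whereas a real pub-sub service would deliver messages asynchronously.

```python
import itertools
from collections import defaultdict


class PubSub:
    """Toy pub-sub: delivers each message to every handler of a topic."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._handlers[topic].append(handler)

    def publish(self, topic, message):
        for handler in self._handlers[topic]:
            handler(message)


class MediaStore:
    """Stores uploaded media and hands back a media ID."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.files = {}

    def upload(self, data):
        media_id = f"media-{next(self._ids)}"
        self.files[media_id] = data
        return media_id


class TweetService:
    MAX_LENGTH = 280

    def __init__(self, pubsub, media_store):
        self._ids = itertools.count(1)
        self._pubsub = pubsub
        self._media_store = media_store
        self.db = {}  # stand-in for the Tweet database

    def create_tweet(self, author, text, media_ids=()):
        if len(text) > self.MAX_LENGTH:
            raise ValueError("Tweet exceeds 280 characters")
        unknown = [m for m in media_ids if m not in self._media_store.files]
        if unknown:
            raise ValueError(f"unknown media IDs: {unknown}")
        tweet = {
            "id": next(self._ids),
            "author": author,
            "text": text,
            "media_ids": list(media_ids),
        }
        self.db[tweet["id"]] = tweet                   # 1. store the Tweet
        self._pubsub.publish("new_tweet", dict(tweet))  # 2. push a copy to pub-sub
        return tweet
```

The client uploads media first, then passes the returned media ID to `create_tweet`, which mirrors the two-step flow described above.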

Let's see how the timeline service works for active and inactive users after it gets triggered by the Tweet service.

Timeline service: The timeline service works differently for active and inactive users because regenerating the timeline every time a new Tweet comes in is not useful for inactive users. Such users may not log into their accounts for long periods, so it makes sense to update their timelines on demand. For active users, the timeline service generates the timeline beforehand in order to respond in near real-time. The timeline service gets the accounts that a user follows from the database, extracts their Tweets, and scores them based on how relevant their content is to the user. The timeline service also interacts with the trend, people discovery, and ads services. These services return different types of suggestions based on user information. The timeline service aggregates the responses of these services and stores them in the feed cache.

When an active user requests a timeline, the API gateway forwards the request to the timeline service. The timeline service gets the timeline from the feed cache and returns it to the client via the API gateway. For an inactive user, however, the timeline service has to interact with the other services (trend, people discovery, and so forth) and generate the timeline at request time.

Generating a timeline for inactive users
Getting a timeline from the feed cache for active users

As we can see from the above illustrations, for active users the timeline service doesn't interact with other services and gets the timeline from the feed cache, which reduces latency.
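
The difference between the two paths can be sketched as follows. This is a simplified model under several assumptions: the `TimelineService` class and its method names are hypothetical, a plain dictionary stands in for the distributed feed cache, Tweets are ordered by ID instead of a relevance score, and the trend, people discovery, and ads services are omitted.

```python
class TimelineService:
    def __init__(self, follows, active_users):
        self.follows = follows            # user -> set of followed accounts
        self.active_users = active_users  # users whose timelines are precomputed
        self.tweets_by_author = {}
        self.feed_cache = {}              # stand-in for a distributed cache
        self.builds = 0                   # counts how often a timeline is generated

    def _build(self, user):
        """Generate a timeline from scratch (the expensive path)."""
        self.builds += 1
        tweets = [
            tweet
            for account in self.follows.get(user, ())
            for tweet in self.tweets_by_author.get(account, ())
        ]
        return sorted(tweets, key=lambda t: t["id"], reverse=True)

    def on_new_tweet(self, tweet):
        """Pub-sub callback: refresh cached timelines of active followers only."""
        self.tweets_by_author.setdefault(tweet["author"], []).append(tweet)
        for user in self.active_users:
            if tweet["author"] in self.follows.get(user, ()):
                self.feed_cache[user] = self._build(user)

    def get_timeline(self, user):
        """Active users hit the cache; inactive users trigger a build."""
        if user in self.feed_cache:
            return self.feed_cache[user]
        return self._build(user)
```

For an active user, `get_timeline` is a single cache lookup. For an inactive user, the work of `_build` happens while the request is waiting, which is where the extra latency comes from.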

Point to Ponder

Question

How does the timeline service handle timeline updates?

There are two methods to get timeline updates:

  1. Pull model
  2. Push model

Pull model: The client sends a request to the server at a fixed interval and gets an updated timeline in response. However, polling repeatedly at a fixed interval can return many empty responses when nothing new has been posted.

Push model: In the push model, notifications are sent to the client from the server whenever a new Tweet is posted. However, if a user with millions of followers posts a Tweet, then notifications will be sent to millions of users, which is not an optimal approach.

Since both methods have their drawbacks, we can combine the two models when an account has millions of followers:

  1. For active followers, use the push model
  2. For inactive followers, use the pull model
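
A minimal sketch of this hybrid approach is shown below. The function names, the `inboxes` dictionary, and the use of Tweet IDs to detect new content are assumptions made for illustration only.

```python
def push_to_active(tweet, followers, active_users, inboxes):
    """Push model: deliver `tweet` to the inbox of every active follower.

    Returns the followers that were skipped; they will pull later.
    """
    skipped = []
    for follower in followers:
        if follower in active_users:
            inboxes.setdefault(follower, []).append(tweet)
        else:
            skipped.append(follower)
    return skipped


def pull_since(followed_accounts, tweets_by_author, last_seen_id):
    """Pull model: on request, fetch Tweets newer than `last_seen_id`."""
    fresh = [
        tweet
        for author in followed_accounts
        for tweet in tweets_by_author.get(author, [])
        if tweet["id"] > last_seen_id
    ]
    return sorted(fresh, key=lambda t: t["id"], reverse=True)
```

With this split, the cost of a Tweet from a popular account is proportional to its active followers rather than to all of its followers.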

Let us now dive into the design decisions.

Design considerations

Let’s now discuss the vital design decisions for optimal communication between clients, the API gateway, and the downstream services.

Architectural style

We will start with the interaction between the client and the API gateway.

Client to API gateway

The Tweet service creates a Tweet, and the timeline service returns a timeline based on the user's request. Meanwhile, users may update or delete their Tweets. In all of these cases, we're mainly performing CRUD operations on resources. Therefore, REST is a good choice for communication between the client and the API gateway because it maps create, read, update, and delete operations directly onto HTTP verbs.
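
The mapping between CRUD operations and HTTP verbs can be sketched with a small in-memory handler. The `/tweets` and `/tweets/{id}` paths and the `TweetResource` class are hypothetical and are not Twitter's actual endpoints; only the verbs and status codes follow standard HTTP semantics.

```python
import itertools


class TweetResource:
    """Maps HTTP verbs on /tweets and /tweets/{id} to CRUD operations."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._tweets = {}

    def handle(self, method, path, body=None):
        parts = [p for p in path.split("/") if p]
        if not parts or parts[0] != "tweets" or len(parts) > 2:
            return 404, None
        if len(parts) == 1:
            if method == "POST":  # Create
                tweet_id = next(self._ids)
                self._tweets[tweet_id] = {"id": tweet_id, "text": body["text"]}
                return 201, self._tweets[tweet_id]
            return 405, None
        if not parts[1].isdigit() or int(parts[1]) not in self._tweets:
            return 404, None
        tweet_id = int(parts[1])
        if method == "GET":       # Read
            return 200, self._tweets[tweet_id]
        if method == "PUT":       # Update
            self._tweets[tweet_id]["text"] = body["text"]
            return 200, self._tweets[tweet_id]
        if method == "DELETE":    # Delete
            del self._tweets[tweet_id]
            return 204, None
        return 405, None
```

Each operation is identified by the verb and the resource path alone, so the client needs no operation-specific protocol beyond HTTP itself.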

API architecture style for interaction between the client and the API gateway

API gateway to back-end services

The next interaction occurs between the API gateway and the back-end services. The API gateway interacts with the Tweet and timeline services independently, which means that it does not need to call multiple services to complete a single requested operation. Moreover, the Tweet and timeline services perform only CRUD operations, and the API gateway does not have to assemble customized data from multiple services. Thus, REST is also a suitable option between the API gateway and the back-end services.
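
Because each request maps to exactly one back-end service, the gateway's routing logic can stay simple. The sketch below forwards a request to one service based on its path prefix and rotates through that service's servers in round-robin order. The `ApiGateway` class, the path prefixes, and the server names are assumptions for illustration.

```python
import itertools


class ApiGateway:
    def __init__(self, backends):
        # backends maps a path prefix to the servers of one service, e.g.
        # {"/tweets": ["tweet-a", "tweet-b"], "/timelines": ["timeline-a"]}
        self._pools = {
            prefix: itertools.cycle(servers)
            for prefix, servers in backends.items()
        }

    def route(self, path):
        """Pick the single service for `path`, then the next server in its pool."""
        for prefix, pool in self._pools.items():
            if path == prefix or path.startswith(prefix + "/"):
                return next(pool)
        raise LookupError(f"no service registered for {path}")
```

If a single client request had to combine data from several services, this one-to-one routing would no longer be enough, and the choice of REST between the gateway and the back end would need to be revisited.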

REST API architecture style between API gateway and back-end services

HTTP version

Choosing an HTTP version becomes easier once we analyze our requests and responses. Both can carry text-based Tweets or media files, and a response can also contain links to images or videos that the client retrieves from the CDN. At first glance, HTTP/1.1 may seem like the obvious choice between the client and the server, and it is sufficient for uploading a single file, as discussed in the File API Design Decisions lesson. However, the Twitter API lets a user upload multiple media files simultaneously. When several files are posted at once, multiplexing multiple requests over a single connection is highly desirable, and header compression reduces the overhead of each request. Both features are provided by HTTP/2, which makes it the better fit.

Data formats

Since the timeline contains various types of data, it makes sense to use more than one data format to deliver it to the client. For instance, Tweets, the textual portions of ads, trends, and so on can be communicated in JSON. The images and videos in Tweets or ads should be transmitted in a binary format, both to and from the client. Note that HTTP/2 itself frames every message in binary: the JSON body is still JSON text, but it travels inside binary frames, so the application does not need to convert it.
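
The split between the two formats can be sketched as follows. The function names, the field names, and the choice of `application/octet-stream` for media are assumptions for illustration. The timeline body is JSON and refers to media by link, while the media bytes themselves are sent unchanged.

```python
import json


def timeline_response(tweets, trends, ads):
    """Serialize the textual parts of a timeline as a JSON body."""
    payload = {"tweets": tweets, "trends": trends, "ads": ads}
    return json.dumps(payload).encode("utf-8"), "application/json"


def media_upload_request(data):
    """Send media as raw bytes; no text encoding is applied."""
    return data, "application/octet-stream"
```

Keeping media out of the JSON body avoids text-encoding large binary files, and the client can fetch the linked media from the CDN separately.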

Summary

In this lesson, we discussed the complete workflow, from the client to Twitter's back-end services, and made several design decisions for the API. These are summarized in the table below:

Design Considerations | Client-to-API Gateway | API Gateway-to-Backend
Architecture style | REST | REST
Data format | JSON, Binary | JSON, Binary
HTTP version | HTTP/2 | HTTP/2
